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Abstract. Given a prime q and a negative discriminant D, the CM method 
constructs an elliptic curve E/F q by obtaining a root of the Hilbert class 
polynomial Ho(X) modulo q. We consider an approach based on a decompo- 
sition of the ring class field defined by Ho, which we adapt to a CRT setting. 
This yields two algorithms, each of which obtains a root of Ho mod q without 
necessarily computing any of its coefficients. Heuristically, our approach uses 
asymptotically less time and space than the standard CM method for almost 
all D. Under the GRH, and reasonable assumptions about the size of logg 
relative to \D\, we achieve a space complexity of 0((m + n) log q) bits, where 
mn = h(D), which may be as small as 0(|D| 1//4 log q). The practical efficiency 
of the algorithms is demonstrated using \D\ > 10 16 and q « 2 256 , and also 
\D\ > 10 15 and q a 2 33220 . These examples are both an order of magnitude 
larger than the best previous results obtained with the CM method. 



1. Introduction 

The CM method is a widely used technique for constructing elliptic curves over 
finite fields. To illustrate, let us construct an elliptic curve E/F q with exactly N 
points. We shall assume that q is prime, and require t = q + 1 — TV to be nonzero 
and satisfy \t\ < 2Jq. We may write Aq = i 2 — v 2 D, for some nonzero integer v and 
negative discriminant D, and then proceed as follows: 

1. Compute the Hilbert class polynomial Ho £ Z[Jf]. 

2. Find a root x of Ho(X) modulo q. 

The root x is the j-invariant of an elliptic curve E with #E(F q ) — N; an explicit 
equation for E can be obtained via [37]. The endomorphism ring End(-E) is isomor- 
phic to the imaginary quadratic order O with discriminant D, and we say that E 
has complex multiplication (CM) by O. 

In principle, the CM method can construct any ordinary elliptic curve E/F q . 
In practice, it is feasible only when \D\ is fairly small. The main difficulty lies in 
step 1. The Hilbert class polynomial is notoriously large, as may be seen below: 
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152 KB 
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465872 
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10 8 4 
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239 MB 


10 14 - 
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384 TB 


10 9 4 


- 15 


15216 


2.73 GB 


10 15 - 


M5 


14635920 


4.45 PB 


10 10 


+ 47 


48720 


31.4 GB 


10 16 - 


\- 135 


46275182 


47.2 PB 



The value h(D) is the class number of D, which is the degree of Ho- The size 
listed is an upper bound on the total size of Ho derived from known bounds on its 
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largest coefficient [42, Lemma 8], and is generally accurate to within ten percent. 
These discriminants were chosen so that the ratio h(D)/ y/\D\ is within ten percent 
of its asymptotic average (0.461559 . . .), so they represent typical examples. 

There are at least three different ways to compute Ho- the complex analytic 
method [3, 21, 26], a p-adic approach [12, 18], or by computing Ho modulo many 
small primes and applying the Chinese remainder theorem (CRT) [1, 7, 15]. Under 
suitable heuristic assumptions all three approaches can achieve quasi-linear running 
times: 0(|-D| log c |D|) for some constant c. However, the 0(|-D| log 1+e \D\) space 
needed to compute Ho makes it difficult to apply these algorithms when |D| is 
large. As noted in [21], space is typically the limiting factor. 

Here we build on the CRT approach of [42] , which gives a probabilistic (Las Ve- 
gas) algorithm to compute Ho, with an expected running time of 0(|£»|log 5+e |£>|) 
under the generalized Riemann hypothesis (GRH), and a heuristic running time 
of 0(\D\ log 3+£ \D\). Most critically for the CM method, it directly computes 
Hq mod q without computing Ho over Z. This yields a space complexity of 
0(|£)| 1 / 2+£ logg), allowing it to handle much larger values of \D\. 

As a practical optimization, the CM method may use alternative class polyno- 
mials that are smaller than Ho by a large constant factor. The algorithm in [42] 
has recently been adapted to compute such class polynomials [24] . To simplify our 
presentation we focus on the Hilbcrt class polynomial Ho, but our results apply to 
all the class polynomials considered in [24], a feature we exploit in §6. 

When h(D) is composite, a root of Ho can be obtained via a decomposition of 
the field extension defined by Ho{X), as described in [23, 28]. The algorithm in [23] 
computes integer polynomials that describe this decomposition, which can be used 
in place of Ho- This allows a single root-finding operation to be replaced by two 
root-finding operations of smaller degree, speeding up step 2 of the CM method. 
We adapt this technique to a CRT setting, where we find it also accelerates step 1, 
which is the asymptotically dominant step as a function of \D\. 

Provided h(D) is sufficiently composite, we may choose a decomposition that 
significantly reduces the size of the coefficients in the defining polynomials. In order 
to do so, we derive an explicit height bound that can be efficiently computed for each 
of the possible decompositions available. By choosing the optimal decomposition we 
gain nearly a log \ D\ factor in the running time, on average, based on the heuristic 
analysis in §5.4. This claim is supported by empirical data, and we give practical 
examples that achieve more than a tenfold speedup. 

We are also able to improve the space complexity of the CM method. 

Proposition. Assume the GRH, and fix real constants 8 > and e > 0. Let O 

be an imaginary quadratic order with discriminant D and class number h = run, 
with m < OdD] 1 / 2 ^ 6 ). Let q be a prime of the form 4g = t 2 — v 2 D, and as- 
sume log q = 0(\D\ S log \D\). An elliptic curve E/F q with End(E) = O can be 
constructed in 0{\D\ log 6+e \D\) expected time using 0{[m + n) logg) space. 

This is achieved by interleaving the computation of the defining polynomials 
modulo many small primes p with root-finding operations modulo q. If we addi- 
tionally require m ~ Q(\D\ 1/ ' 2 ~''), for some 7 > 6, this yields an 0(|Z)| 1 / 4+7 logg) 
space bound, improving the OdD] 1 / 2 ^ log q) result in [42]. 



We assume throughout that fast probabilistic methods arc used to find roots of polynomials 
over finite fields. 
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The organization of the paper is as follows. We begin with some necessary prepa- 
ration in §2, and then present two algorithms to obtain a root of the Hilbert (or 
other) class polynomial modulo a prime q in §3 and §4. The optimization of the 
height bound is addressed in §5. Finally, we present performance data in §6, includ- 
ing the construction of an elliptic curve over a 256-bit prime field with \D\ > 10 16 , 
and an elliptic curve over a 10000-digit prime field with \D\ > 10 15 , both of which 
are new records for the CM method. 

2. Background 

2.1. Hilbert class polynomials. We first recall some facts from the theory of 
complex multiplication, referring to [19, 34, 40] for proofs and further background. 
Let O be an imaginary quadratic order, identified by its discriminant D. The j- 
invariant of the lattice O is an algebraic integer whose minimal polynomial is the 
Hilbert class polynomial Hd- If a is an invertible O-ideal (including a = O), then 
the torus C/o corresponds to an elliptic curve E/C with CM by 0, and every such 
curve arises in this fashion. Equivalent ideals yield isomorphic elliptic curves, and 
this gives a bijection between the ideal class group cl(O) and the set 

EUc(C) = {j(E/C): End(£) £* O}, 

the j-invariants of the elliptic curves defined over C with CM by O. We then have 

(1) H D {X)= H {X-ji). 

ii€EU (C) 

The splitting field of Hd over K = Q(VD) is the ring class field Kq- It is an 
abelian extension whose Galois group is isomorphic to cl(O), via the Artin map. 

This isomorphism can be made explicit via isogenics. Let E/C be an elliptic 
curve with CM by O and let a be an invertible O-ideal. After fixing an isomorphism 
End(_E) = O, there is a uniquely determined separable isogeny whose kernel is the 
group of points annihilated by every endomorphism in a C O = End(E). The image 
of this isogeny also has CM by 0, and this defines an action of the ideal group of O 
on the set Ello(C). Principal ideals act trivially, and the induced action of the 
class group is regular. Thus the set Elle>(C) is a principal homogeneous space, a 
torsor, for the group cl(O). For a j-invariant ji in Ello(C) and an ideal class [a] in 
cl(O), we write [a]ji to denote the image of ji under the action of [a]. 

If p is a prime that splits completely in Kq, cquivalcntly (for p > 3), a prime 
that satisfies the norm equation 

(2) 4p = t 2 - v 2 D, 

for some nonzero integers t and v, then Hd splits completely in F p LY] and its 
roots form the set Elle>(F p ). Conversely, every ordinary (not supersingular) elliptic 
curve E/F p has CM by some imaginary quadratic order O in which the Frobenius 
endomorphism corresponds to an element of norm p and trace t. 

2.2. Computing Hd with the CRT method. The above theory suggests the 
following algorithm to compute H D modulo a prime p that splits completely in Kq: 

1. Find an elliptic curve E with ^'-invariant ji € Ello(F p ). 

2. Enumerate E11 (F P ) as {[a]ji : [a] e cl(O)} = {ji, . . . ,j h }- 

3. Compute H D (X) mod p as (X - ji){X - j 2 ) ■ ■ ■ (X - j h ). 
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Step 1 essentially involves testing curves at random, so this algorithm is feasible 
only when p is fairly small, but see [42, §3] for various ways in which this step may 
be optimized. Step 2 is addressed in §2.6. Step 3 is just polynomial arithmetic, but 
this is usually the most expensive step [42, §7.1]. 

Under the GRH we can we always work with primes p that are roughly the 
same size as \D\, no matter how big q is. We select a sufficiently large set of such 
small primes that split completely in Kq, and compute Hd mod p for each of them. 
Provided the product of these primes is larger than 2B, where B bounds the coeffi- 
cients of Ho , the polynomial Hp is uniquely determined by the Chinese Remainder 
Theorem (CRT). Explicit values for B are given in [22] and [42, Lemma 8]. 

However, to compute Hd mod q in 0(|Z)| 1 / 2+£ log q) space, one cannot simply 
compute Hd over Z and then reduce it modulo q, since writing down Hd would 
already take too much space. Instead, one may use the explicit CRT (mod q) [9, 10], 
applying it in an online fashion to accumulate results modulo q that are updated 
after each computation of Hd mod p, as described in [42, §6] and summarized in §2.3 
below. Here we also use the explicit CRT, but to obtain a root of H D mod q with 
a space complexity that may be as small as 0(|Z)| 1//4+£ logg), we must even avoid 
writing down Hd mod q. This requires some new techniques that are described 
in §4.3.2. 

2.3. Explicit CRT. Let pi, . . . ,p n be primes with product M, let Mj = M/pi, and 
let aiMi = 1 mod p%. If c G Z satisfies c = Cj mod pi, then c = ^ CiaiMi mod M. 
If M > 2|c|, this congruence uniquely determines c. This is the usual CRT method. 

Now suppose M > 4|c| and let q be a prime (or any integer, in fact). Then we 
may apply the explicit CRT mod q [10, Thm. 3.1] to compute 



where r is the closest integer to *Ylii c i a ilVi'; when computing r, it suffices to ap- 
proximate each Cidi/pi to within l/(4n), by [10, Thm. 2.2]. 

When applying the explicit CRT to compute many values c mod q, say, the 
coefficients of a polynomial, one first computes the ai (mod pi) and M, (mod q) 
using a product tree as described in [42, §6.1]; this is CRT precomputation step, 
and it does not depend on the coefficients c. Then, as the reduced coefficient values 
Cj = c mod pi are computed for a particular pi, the sum ^ CidiMi mod q and an 
approximation to ^ Cidijpi are updated for each coefficient, after which the Cj may 
be discarded; this is the CRT update step. Finally, when these sums have been 
updated for every prime pi, one applies (3) to obtain c mod q for each coefficient, 
using the sums ^CiCiiMi and the approximations r rts J2 c i a i/Pii tms ^ the CRT 
postcomputation step. For further details, including explicit algorithms for each 
step, see [42, §6.2] 

2.4. Assuming the GRH. The Chebotarev density theorem guarantees that a 
set of primes S sufficient to compute Hd with the CRT method exists. We even 
have effective bounds on their size [33], but these are too large for our purpose. 
To obtain better bounds, we assume the GRH (for the Dedekind zeta function 
of Kq), which implies that the primes in S are no more than a poly logarithmic 
factor larger than \D\, see [42, Lemma 3]. Having made this assumption, we are 
then in a position to apply other bounds that depend on some instance of the 
extended or generalized Riemann hypothesis, all of which we take to be included 
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when we assume the GRH without qualification. In particular, we make frequent 
use of the bound h(D) = O^D] 1 / 2 log log \D\), proven in [35], as well as the bound 

(4) £ ] ^- T =0(Iog|D|logIog|£>|) ) 

where a is the invertible ideal of least norm in [a] , from [7, Lemma 2] . The sum 
in (4) may also be written as ^ where (Ai,Bi, d) ranges over the primitive 

reduced binary quadratic forms AiX 2 +BiXy+Ciy 2 of discriminant D = B 2 — AAiCi. 

2.5. Decomposing the class equation. We now describe an explicit method for 
"decomposing" a polynomial via a decomposition of the field extension it defines. 
This is based on material in [23] and [28] that we adapt to our purpose here. 

Let K be a number field, and let P e Z[X] be a monic polynomial, irreducible 
over K, with splitting field M. We have in mind K = Q(v A D), P = Ho, and 
M = Kq. We shall assume the action of Gal(M/K) is regular, equivalently, that 
[M :K] — deg P, which holds in the case of interest. 

Given a normal tower of fields K C L C M , we may decompose the extension 
M/K into extensions M/L and L/K via polynomials U e Z{Y)[X] and V e Z[X], 
where P(x) = if and only if U(x, y) = and V(y) = for some y e L. The 
polynomial V(Y) defines the extension L/K, and for any root y of V, the polynomial 
U(X,y) defines the extension M/L. In our application L will be identified as the 
fixed field of a given normal subgroup G of Gal(M/K). 

Let us fix a root x of P, and let jx denote the conjugate of x under the action 
of 7 e Gal(M/K). Let f3\, . . . , (3 n be the elements of G, and let ot\G, . . . , a m G be 
the cosets of G in G&\{M/K). We may factor P in L[X] as P = P\ ■ ■ ■ P m , where 

n n 

(5) Pi(X)= HiX-atpkx) =Y.°^ xk - 

k=l k=0 

Now let us pick a symmetric function s in Z[Ti, . . . ,T n ] for which each of the 
m values yi = s(ctif3ix, . . . ,oei/3 n x) are distinct, equivalently, for which L = K(yi); 
Lemma 1 below shows that this is easily achieved. We then define the polynomial 

m 

(6) V(Y) = Y[(Y- yi ). 

i=i 

The coefficients of V lie in Z, since each may be expressed as a symmetric integer 
polynomial in the roots of the monic polynomial P € Z[X]. For < k < n let 

(7) «W)-E%^5- 

As with V, we have Wk € Z[y]. Note that Wk(Y) is the unique polynomial of 
degree less than m for which Wk(yi) — OikV'(yi), by the Lagrange interpolation 
formula. This definition of the Wk is referred to as the Hecke representation in [23] . 
Finally, let 

(8) U ( X ^ Y ) = V ^r ) T,Wk(Y)X k . 

^ ' k=0 

For each root y, of V(Y) we then have U(X, yi) = Pi(X), with V'(yi) ^ 0, since V 
has distinct roots. Each root of U(X, yt) is a root of P(X), and every root of P{X) 
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may be obtained in this way. Notice that this construction does not require us to 
know the coefficients of P, but we must be able enumerate the G-orbits of its roots. 

As noted in [28], for any given K, P, and G, there are only finitely many linear 
combinations of the elementary symmetric functions, up to scalar factor, that do 
not yield a symmetric function s suitable for the construction above. More precisely, 
we have the following lemma. 

Lemma 1. Let M = K(x) be a finite Galois extension of a number field K , let G be 
a normal subgroup of G&l(M / K ) with order n and fixed field L, and let m = [L : K]. 
Let x = (xi, . . . ,x n ) denote the G -orbit of x, and lete\,...,e n denote the elemen- 
tary symmetric functions on n variables. There are at most m n ~ l {nn — l) n_1 linear 
combinations of the form s = e\ + c 2 e 2 + • • • + c n e n , with c 2 , . . . , c„ € K, for which 
L^K(s(x)). ' 

Proof. Let z k = e k (x) and Lk = K{z\, . . . , z k ). We have L\ C • • • C L n = L, where 
the last equality follows from the fact that a; is a root of the monic polynomial 
X n + Yl=i{- 1 ) ke kX n - k in L n [X], since [L(x):L] = n. From the proof of the 
primitive element theorem in [44, §6.10], we know that K (z\, z 2 ) — K{z\+c 2 z 2 ) for 
all C2 G K not of the form (iii — ui)/(vj — with j > 1, where z\ — u\, u 2 , ■ ■ ■ , u r 
are conjugates in L/K, and z 2 — vi,v 2 ,...,v s are conjugates in L/K. Since 
r,s < to, there can be at most to(to — 1) such c 2 . The same argument shows 
that K(zi + c 2 z 2 + • • • + Ck-iZk-i,Zk) = K{z\ + c 2 z 2 + • • • + c k z k ) for all but at 
most to(to — 1) values of Ck £ K, and the lemma follows. □ 

In the construction above, if L = K(s(x)) holds for any G-orbit x, then it 
necessarily holds for every G-orbit contained in the same Gal(M/if )-orbit, and in 
this case the s-images yi of these G-orbits are distinct, since they are G&\(L/K)- 
conjugatcs. To find such an s explicitly, we pick random integer coefficients c 2 , . . . , c„ 
uniformly distributed over [0,2to 2 — 1], and let s = e\ + c 2 e 2 + • • • + c k e k . By 
Lemma 1, we then have L = K(s(x)) with probability at least 1 — 2 1_ ™. In the 
event that L ^ K(s(x)), we simply pick a new set of random coefficients and repeat 
until we succeed, yielding a Las Vegas algorithm. In the trivial case, n = 1 and we 
succeed on the first try with s = e\\ for n > 1 we expect to succeed with at most 
1/(1 — 2 1_ ") < 2 attempts, on average. 

2.6. Orbit enumeration. Recall that EUe>(F p ) is a torsor for cl(O). Each sub- 
group G of cl(O) partitions Elle>(F p ) into G-orbits that correspond to cosets of G. 
Given ji € Eile>(F p ), the set Elle>(F p ) may be enumerated via [42, Alg. 1.3], but 
to correctly identify its G-orbits we require some further refinements. 

As in [42, §5], we represent cl(O) using a polycyclic presentation defined by 
a sequence of ideals Ii, . . . , lk of prime norms whose classes generate cl(O). The 
relative order of [li] is the least positive integer for which [C[ s ] € ([Ii], • • ■ , [li-i])- 
Every element [o] of cl(O) may be uniquely written in the form 

with < ei < ri. Having fixed a polycyclic presentation, we may enumerate cl(O) 
by enumerating the corresponding exponent vectors (ei, . . . , e/j). Moreover, for any 
subgroup G of cl(O), we can easily identify the exponent vectors corresponding to 
each coset of G. This allows us to distinguish the G-orbits of Elle>(F p ), provided 
that we can unambiguously compute the actions of [Ii],. . . ,[[&]. 
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Let ji — j(Ei) be an element of G EUe>(F p ) and let [ be an invertible O-ideal of 
prime norm t ^ p. Then j% = [l]ji is the j-invariant of an elliptic curve E2 that is 
£-isogenous to E±. We may obtain j% G F p as a root of 4>{X) = &e(ji,X), where 
&n G ZLY. 1"] is the classical modular polynomial [46, §69] that parameterizes cyclic 
isogenies of degree N. When [I] has order 2, there is just one distinct root of <p(X) 
that lies in Elle>(F p ), but in general there will be two: and [Qjt. Typically these 
are the only roots of <j>{X) in F p , and when they are not, they may be distinguished 
as the roots that lie on the surface of an i- volcano, see [42, §4] or [32]. The only 
ambiguity lies in distinguishing the actions of [[] and [I], which correspond to the 
two directions we may walk along the cycle of ^-isogenies on the surface. 

To enumerate Elle>(F p ) it is not necessary to distinguish these directions, as 
shown in [42, Prop. 5]. However, it is necessary, in general, if we wish to identify 
the G-orbits of Ello(F p ) in this enumeration. To see why, let cl(O) be a cyclic 
group of order 6 generated by [a], and suppose our polycyclic presentation has 
[li] = [a 2 ] with r\ = 3, and [[2] = [0] with = 2. Below are four of the eight 
possible enumerations of Elle>(F p ) that might be produced by Algorithm 1.3 of [42] 
in this scenario: 




Each node corresponds to a j-invariant [a e ]ji and is labeled by the exponent e. 
The arrows indicate the action of the ideal that appears in the label, and bold arrows 
indicate where an arbitrary choice was made. The two lightly shaded nodes in each 
configuration correspond to the expected positions of the elements of Elle>(F p ) 
corresponding to the subgroup G = (a 3 ) of order 2. The two configurations on the 
right yield a correct enumeration of each G-orbit, but the two on the left do not. 

Remark 1. Of course we could have used a polycyclic presentation with [h] = [a] 
and n = 6 in this example, but unless [a] happens to contain the invertible O-ideal 
of least prime norm, this is suboptimal, since the cost of computing the action of [I] 
increases quadratically with the norm of [. The optimal polycyclic presentation 
rarely has one generator, even when cl(O) is cyclic. 

We now consider ways to enumerate Elle>(F p ) that allow us to identify its G- 
orbits. First, the GCD technique of [24, §2.3] may be used to consistently choose 
li or ^ each time a new path of £i-isogenies is begun (this actually speeds up the 
enumeration, so it should be done in any case). We are then left with just k choices, 
one for each U. To consistently orient these choices we may either: (a) use auxiliary 
relations [c^] = [b ■ • • li], where has (small) prime norm different from 
as described in [24, §4.3], or (b) compute the action of Frobenius on the kernel of 
the two isogenies, as discussed in [13, §4] and the references therein. Option (a) is 
fast and easy to implement, but for the best space complexity we should use (b), 
since it does not require us to store the entire enumeration. A minor drawback to 
option (b) is that it requires ti \ v, where v is as in the norm equation (2). 

However, in many cases neither (a) nor (b) is necessary. If G is of the form 

(9) G = ([ti], . . . , [U-i], HI]) 

for some d < k and e\rd, then it follows from the proof of [42, Prop. 5] that any 
enumeration of Elle>(F p ) output by [42, Alg. 1.3] also gives a correct enumeration of 
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the G-orbits of Elle>(F p ). In the context of the CM method, we are free to choose G, 
and we can always choose one that satisfies (9). This may limit our choices for G, 
but in practice this restriction is usually not a burden. 

3. A FIRST ALGORITHM 

Our first algorithm is a direct implementation of the theory presented in §2, 
which can significantly accelerate the CM method in the typical case where the 
class number h(D) is composite. At this stage we shall not be concerned with 
improving space complexity; this will be the focus of our second algorithm. Both 
algorithms are probabilistic (of Las Vegas type) . 

As above, O is an imaginary quadratic order with discriminant D, and we shall 
assume h = h(D) > 1. Let Vd be the set of primes that split completely in Kq, 
equivalently, primes that satisfy (2). Given D and a prime q £ Vd wc wish to obtain 
a root of the Hilbert class polynomial Hd over F q . Such a root is the j-invariant 
of an elliptic curve E/F q with CM by 0, as required by the CM method. 

The first step is to choose a subgroup G of cl(O), which determines n = \G\ and 
m = h/n. The quantities Oik and yi, V(Y), Wk{Y), and U(X, Y), are then defined 
as in §2.5, with 1 < i < m and < k < n (we do not need k = n). 

The Hi (and therefore V(Y)) depend on the choice of a function s G Z[Xi, . . . , X n ] 
that is a randomly chosen linear combination of the elementary symmetric functions 
ei, . . . , e„, as described at the end of §5. In the unlikely event that we pick a bad s, 
we may find that V is a perfect power. In this case the algorithm simply repeats 
the computation with a new choice of s. 

We also require a bound B on the coefficients of V and Wk- The computation 
of B is addressed in §5. We now give the algorithm. 

Algorithm 1. Given D and q € Vd, find o, root x of Hd in F q as follows: 

1. Select a subgroup G of cl(O) with 2|G| 2 < q and let n = \G\. 

2. Generate random integers c 2 , . . . , c„ uniformly distributed over [0, 2m 2 — 1] 
and set s = e± + c 2 e 2 H h c n e n . 

3. Compute a bound B as described in §5. 

4. As in steps 1-3 of [42, Alg. 2], use B to select S c Vd, compute a polycyclic 
presentation T for cl(O), and perform CRT precomputation (see §2.3). 

5. For each p e S: 

a. Find j x e E11 (F P ) as in [42, Alg. 1]. 

b. Enumerate the G-orbits Gj of Elle>(F p ) using j\ and T. 

c. Compute the Oik and the t/j modp (using s), as described in §2.5 

d. Compute V and the Wk mod p. 

e. Update CRT data for the coefficients of V and the Wk- 

6. Perform CRT postcomputation to obtain V and the Wk mod q. 

7. Working in F q , find a root y of V that is not a root of V'. 
If no such root exists then return to step 2. 

8. Compute U V (X) = U(X, y) mod q and output a root x of U y in F q . 

It follows from §2.5 that the output value x is a root of Hd in F q , thus the 
algorithm is correct. Lemma 1 guarantees that it always terminates, and that its 
expected running time is no more than twice the expected time when the first choice 
of s works (this factor approaches 1 as n increases). 
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Remark 2. For practical implementation one may prefer to fix s — e\ (or s = 
and instead change the choice of G if s = ei does not work. Using s = ei simplifies 
the implementation and can reduce the bound B significantly. Empirically, for 
large values of \D\ and q, using s — e\ is very likely to work with every choice of G. 
Alternatively, one may start with s = e\ and then switch to a random s if necessary. 
In all our examples, including all the computations in §6, and in the heuristic 
analysis of §5.4, we use s = ±ei, but our mathematical results (Propositions 1-3) 
all assume a random s, as specified in Algorithm 1. 

We note three immediate generalizations of Algorithm 1. First, other class in- 
variants may be treated with suitable modifications to step 2, see [24]. Second, it is 
not necessary for q to be prime; q may be a prime power q$ satisfying 4q = t 2 — v 2 D 
with t ^ mod qo- One then computes V and the Wk mod qo, and performs the 
root-finding operations in F q . Third, we can compute V and the Wk over Z using 
the standard CRT: replace q by the product of the primes in S and lift the results of 
step 4 to Z. Steps 7 and 8 may then later be applied to any q that splits completely 
in the ring class field. 

3.1. Example. Let us find a root of Hp mod q using Algorithm 1, with D — —971 
and q — 1029167. The class group is cyclic of order 15, and the optimal polycyclic 
presentation 2 has norms i\ — 3 and ^2 = 5 and relative orders r\ = 5 and r 2 = 3. 
We choose G to be the subgroup of order n — 5, which is conveniently of the 
form (9), so we need not distinguish directions when enumerating Elle>(F p ). For 
convenience we set s = —e±, so that yi = 9i. n -i)- 

Computing the bound B as in §5, we have b — log 2 B 340 bits, and select a 
set of primes in Vd whose product exceeds AB. As described in [42, §3], we choose 
primes that optimize the search for j\ G Elle>(F p ), obtaining a set of 27 primes: 

S = {263, 353, 1871, . . . , 38677, 43237, 62873}. 

We then precompute parameters for the explicit CRT (mod q), using [42, Alg 2.3]. 
For p = 263 we find j-y = 252, and enumerate Elle>(F p ) from j\ as: 




The horizontal arrows denote 3-isogenies and the vertical arrows are 5-isogenies. 
The three G-orbits of Ell (F p ) are {252, 38, 151, 121, 258}, {70, 112, 182, 198, 140}, 
and {202, 130, 183, 196, 136}, corresponding to the rows in the diagram above. Note 
that these orbits do not depend on the choice of direction made at the start of each 
line of isogenies, nor do they depend on the choice of j\. 
Continuing with p = 263, we compute the products 

Pi (A) = p (X - 252){X - 38)(A - 151)(A - 121)(Jf - 258), 

P 2 {X) =, p (X - 70){X - 112)(X - 182)(X - 198)(A - 140), 

P 3 (X) = p (X - 202){X - 130)(A - 183)(A - 196)(A - 136). 



2 The symmetry of the li and in this small example is entirely coincidental. 
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Each 6ik is obtained as the coefficient of X k in the polynomiai Pi\ 

Pi(X) = p X 5 + 232A 4 + 32X 3 + 159X 2 + 208X + 158, 

P 2 (X) = p X 5 + 87X 4 + 252X 3 + 139A 2 + 103A + 21, 

P 3 (X) = p X 5 + 205A 4 + 86X 3 + 113A 2 + 121 X + 116. 

We then set y\ = 0u = 232, y 2 = 6 24 = 87, and y 3 = 834 — 205. Using the values 
yi and the 9ik, we compute 

V(Y) = (Y - yi )(Y - y 2 )(Y - y 3 ) = p Y 3 + 2 Y 2 + 104 Y + 59, 

W oOO = £ lO V(Y)/(Y - Vi ) = p 32 Y 2 + 259Y + 152, 

W 1 (Y) = Y,6 il V(Y)/{Y-y i ) = p 169 Y 2 + 41 Y + 153, 

W 2 (Y) = l2 V(Y)/(Y - y t ) = p 148 Y 2 + 117 Y + 277, 

W 3 (Y) = ^e i3 V(Y)/(Y - Vi ) = p 107y 2 + 115y + 244, 

We do not need W4 because yi = 6i, n -i implies W„_i(j/j) = yiV'(yi), hence the 
coefficient of X n ~ 1 in U(X,yi) is just yi. We complete our work for p = 263 by 
updating the CRT coefficient data (mod q) for the 15 nontrivial coefficients above, 
using [42, Alg 2.4]. The same procedure is then applied for each p € S. 

Having processed all the primes in S, a small postcomputation step [42, Alg 2.5] 
yields V and the Wk modulo q: 

V(Y) = q Y 3 + 947907 Y 2 + 829791 Y + 760884, 

W (Y)= q 975377 Y 2 + 130975 Y + 363724, 

Wi(F) = q 240332 Y 2 + 135971 Y + 616131, 

W 2 (Y) = q 126738 Y 2 + 479879 Y + 908580, 

W 3 (Y) = q 340801 Y 2 + 1000285 Y + 68659. 

The roots of V mod q arc y 1 = 336976, y 2 = 898530, and y 3 = 904088, none of 
which are roots of V' mod q. Using y = y\ we compute 

U(X,y) = X 5 + yX 4 + ^^{w 3 {y)X 3 + W 2 {y)X 2 + W^X + W (yj), 

= q X b + 336976A 4 + 556976A 3 + 849678A 2 + 363260A + 95575, 

and we then find that x — 590272 is a root of U(X, y), and hence of Hd, modulo q. 

This completes the execution of Algorithm 1 on the inputs D = —971 and 
q = 1029167. If we now set k = a;/ (1728 - x) = q 638472, then the elliptic curve 
E/F q defined by 

Y 2 = X 3 + 3kX + 2k = q X 3 + 886249A + 247777 

has complex multiplication by the quadratic order with discriminant —971. 

3.2. Complexity. The running time of Algorithm 1 has have four principal com- 
ponents. Let us define Tfi n d, T cnum , and Tbuiid (respectively) as the average ex- 
pected time, over p e S, to: find ji e Ello(F p ) (step 5a), enumerate the G-orbits 
of Ello(Fp) (step 5b), and build the polynomials V and Wk modulo p (steps 5c 
and 5d) . Additionally, let T roo t be the expected time to find a root y of V mod q 
and to compute and find a root of U (X, y) mod q (steps 7 and 8). As shown in [42], 
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the cost of the precomputation in step 4 and the total cost of all CRT computations 
are negligible, as are steps 1-3. Thus the total expected running time is 

(10) 0(\S\(T Rnd + 

^cnum + Tbuiid) + ^root ) • 

When G is trivial, or equal to cl(O), Algorithm 1 reduces to the standard CM 
method, computing Hjj mod q as in [42, Alg. 2]. In this case, under the GRH we 
have the following bounds: 

|5| =0(|£| 1 / 2 loglog|,D|), 

Tfind =0(Mog 5+e /i), 

T CDUm = 0(hlog 5+e h), 

Tbuiid =0(Mog 3+e /i), 

Troot = 0(h\ogh\og 2+e q). 

The first four bounds are proved in Lemmas 7 and 8 of [42]. 

The last bound is based on the standard probabilistic root-finding algorithm of 
[8, §7], using fast arithmetic in F p [a;]. With Kronecker substitution, multiplying 
two polynomials of degree d in F q [x] uses 0(M(d\ogq)) bit operations, and this 
yields an 0(M(d\ogq) logq) bound on the expected time to find a single root of 
a polynomial in F 9 [a;] of degree d. Here M(n) denotes the time time to multiply 
two n-bit integers [45, Def. 8.26], which we assume to be superlinear, and which 
satisfies the Schonhage-Strassen bound M(n) = O(nlognloglogn); see [39]. 

When n = \G\ properly divides the class number h, the times for T build and T root 
may change; these are analyzed below. The times Tfi n d and T onum are independent 
of the choice of G, as is the space complexity. Depending on the bound B, the size 
of S may also be reduced. This can lead to a substantial practical improvement, 
depending on the choice of G and the value of s, but we defer this issue to §5. For 
the moment we simply note that the bound on \S\ above holds for any choice of G 
and for every s. 

We now consider the time Tbuiid- Computing the Oik m step 5c via (5) involves 
building m polynomials Pi of degree n in F p LY] as products of their linear factors. 
Let M(n) denote the time to multiply two n-bit integers [45, Def. 8.26], which we 
assume to be superlinear. With Kronecker substitution, multiplying two polyno- 
mials of degree n in F p [x] uses 0(M(nlogp)) bit operations. Using a product tree 
for each Pi yields the bound 3 

(11) 0(mM(nlogp)logn) C 0(Mog 2 n\og 1+e p) 

on the cost of computing the Oik- Here we have used 4p > \D\ > h > n and the 
0(n log n log log n) bound [39] for M(n), which we note applies to the algorithms 
used in our implementation. The cost of computing the yi is also dominated by the 
time to build m polynomials of degree n as products of their linear factors: for each 
G-orbit Gi we compute the polynomial YljeGii-^ ~ j) *= -^[Jf] whose coefficients 
arc the values of the symmetric functions ei,...,e„ on d (up to a sign), from 
which we compute the linear combination yi = s. 

The cost of building V as a product of its linear factors is 0(M(mlogp) logm), 
which is dominated by the cost of computing the n polynomials Wk as linear com- 
binations of V(Y)/(Y — yi) with coefficients Oik- Using a recursive algorithm to 



^Our bounds count bit operations and hold for all constants e > (and often for e = o(l)). 
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compute each Wk as in [45, Alg. 10.9], we obtain a total cost of 

(12) 0(nM(ralogp)logm) C 0(h\og 2 m\og 1+e p) 

for each iteration of step 5d. The sum of (11) and (12) is 0(h log 2 /ilog 1+c p), 
essentially the same as the cost of building Ho mod p from its linear factors. There 
is an improvement in the implicit constants, but asymptotically we gain at most a 
factor of 2 in the time T bui i d . 

We may gain much more in the time Troot ■ 

The total expected time for steps 7 

and 8 of Algorithm 1 is bounded by 

(13) T root = 0(M (m log q) log q + hM (log q) + M (n log q) log q). 

The three terms in (13) reflect the time to: (a) find root y of V mod q that is not 
a root of V mod q, (b) compute U V (X) = U(X,y) mod q 7 and (c) find a root of 
U y mod q. In (a), any common roots of V and V are first removed from V via 
repeated division by the GCD, which takes negligible time. We may bound (13) by 

(14) T root - 0(h log 1+e q + (to + n) log h log 2+e q), 

improving T root by a factor of min(log/ilogg,TOn/(m + n)) compared to the time 
to find a root of Ho mod q. This can reduce the cost of root-finding dramatically, 
as may be seen in §6. 

4. A SECOND ALGORITHM 

Algorithm 1 obtains a root of Ho mod q as a root of U(X, y) mod q, where y is 
a root of V mod q, and U is defined by 

1 " 

(8) U(X,Y) = — — Y,W k (Y)X k . 

^ ' k=0 

To compute U(X, y) we need to evaluate each Wk at y £ F q . We observe that it is 
not necessary to know the coefficients of Wk to do this, we could instead use 



(7) W k (Y) = J2 



V(Y) 



provided that we know V and the Oik (which determine the y,{). 

Unfortunately we do not know the 9 ik in F q . Algorithm 1 computes the 9 ik 
in F p , for each prime p £ S, but we cannot readily export this knowledge to F q 
because the Oik do not correspond to the reductions of rational integers. 4 Indeed, 
the entire reason for using the polynomials V and Wk is that they have coefficients 
in Z and are thus defined over any field. 

However, there is nothing to stop us from using (7) to evaluate Wk in F p . Given 
any z € Z, we can certainly compute Wk(z) mod p, and if we do this for sufficiently 
many p we can apply the explicit CRT (mod q) to obtain Wk(z) mod q. 

In particular, we can apply this to a lift of y € F g = Z/qZ to Z. Explicitly, let 
ip = (j)TT, where tt is the unique field isomorphism from F q to Z/qZ and <fi maps 
each residue class in Z/qZ to its unique representative in the interval [0, q— 1]. We 
then have W k {ip{y)) = tp(W k (y)) mod q. 

Thus it suffices to compute Wk = Wk{f{y)) mod q, and this can be accomplished 
by computing Wk mod p for sufficiently many primes p. Note that while y is a root 



4 The Oik are integers of G's fixed field L C Kq. They do not all lie in Q unless G = cl(O). 
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of V mod q, when we reduce <p{y) modulo p, we should not expect to get a root 
of V mod p, nor do we need to; we are simply evaluating the integer polynomial 
Wk{Y) at the integer <p(y) 7 modulo many primes p. 

This leads to our second algorithm, which proceeds in two stages. The first 
stage computes V mod q and finds a root y. The second stage computes the values 
Wk mod q, then computes U(X, y) mod q and finds a root x. The second stage 
requires a bound on |iOfc|, for which we may use Bmq" 1 " 1 , where B bounds the 
coefficients of the Wk- For the sake of simplicity we use a single bound for both 
stages {mq m ~ x times the bound used in Algorithm 1), but in practice one may 
compute separate bounds for each stage (in stage 1 it is only necessary to bound 
the coefficients of V) . In order to achieve the best space complexity, certain steps 
are intentionally repeated, and some may require a more careful implementation, 
see §4.3 for details. 

As before, the choice of G c cl(O) in step 1 determines n = \G\ and m = h/n, 
and the values yi, Oik and Wk are defined for 1 < i < m and < k < n, where 
the yi depend on the symmetric function s constructed in step 1. 
Algorithm 2. Given D and q £ Vd, compute a root x of Hd mod q as follows: 

1. Select G, generate a random s, and compute B, as in Algorithm 1, 
then set B «- m,q m ~ 1 B. 

2. Compute a polycyclic presentation Y for c\{0). 

3. Use B to select S C Vd, and perform CRT precomputation (mod q). 5 

4. For each p £ S: 

a. Find ji £ Ello(F p ) as in [42, Alg. 1], and cache j±(p) = j\. 

b. Enumerate the G-orbits Gi of Ello(F p ) using j\ and Y. 

c. Compute the yi and V mod p. 

d. Update CRT data for V mod q. 

5. Perform CRT postcomputation to obtain V mod q. 

6. Find a root y of V mod q that is not a root of V mod q. 
If no such root exists then return to step 1. 

7. For each p £ S: 

a. Let ji = ji(p) be the element of Ello(F p ) computed in step 4a. 

b. Enumerate the G-orbits Gi of Elle>(F p ) using j\ and Y. 

c. Compute the values 6ik, yi, <p{y) and V mod p, and then use them to 
compute the values Wk = Wk((fi(y)) mod p, via the formula in (7). 6 

d. Update CRT data for the Wk mod q. 

8. Perform CRT postcomputation to obtain the Wk mod q. 

9. Compute U y (X) = U{X, y) mod q and output a root x of U v mod q. 

Computing the Wk in step 7c via (7) is faster than computing the coefficients 
of Wk, by a factor of log 2 m. For a suitable choice of G (with n — \G\ small, say 
poly logarithmic in h), this may improve the time complexity of the entire algorithm, 
as shown in §4.2. However, it is often better to choose G to optimize the bound B, 
as described in §5, which will tend to make G large. 

More significantly, computing the scalars Wk rather than the polynomials Wk 
reduces the space complexity to 0(h\ogh + (m + n)logq), which may be much 



'To optimize space this step may need to be performed in an amortized fashion, see §4.3. 
'It is possible to do this without explicitly computing V mod p, see §4.1. 
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better than the 0{h log q) space complexity of Algorithm 1 when q is large. This 
can even be improved to 0((m + n) \ogq) using a more intricate implementation, 
but this may increase the time complexity slightly. The details are given in §4.3, 
where the space complexity of Algorithm 2 is analyzed. 

4.1. Example. Let us revisit the example of §3.1, with D = —971 and q = 1029167. 
As before, the class group is cyclic of order h = 15, we let G be the subgroup of 
order n = 5, and set s = —e\ so that y^ = 0j >n _i. 

The first stage of Algorithm 2 proceeds as in Algorithm 1, except that our height 
bound is now log 2 (m<? m ~ 1 -B) w 366 bits, so we select a slightly larger set S, with 
29 primes whose product exceeds 4mq m ~ 1 B. For p = 263 we again find that the 
three G-orbits of E11 (F P ) are {252, 38, 151, 121, 258}, {70, 112, 182, 198, 140}, and 
{202, 130, 183, 196, 136}. But instead of computing all the Qik as in Algorithm 1, 
we only need to compute 

yi = p -(252 + 38 + 151 + 121 + 258) = p 232, 

2/2 = P -(70 + 112 + 182 + 198 + 140) = p 87, 

y 3 = P -(202 + 130 + 183 + 196 + 136) = p 205, 

which yields 

V(Y) = (Y- yi )(Y- y 2 )(Y - y 3 ) = p Y 3 + 2 Y 2 + 104 F + 59. 

After computing V mod p for all p € S we obtain, via the explicit CRT (mod q), 

V(Y) = q Y 3 + 947907 Y 2 + 829791 Y + 760884, 

and we again find that y = 336976 is a root of V and not V. We now lift y from 
F q = Z/qZ to the integer <p(y) = 336976 and begin the second stage. 

For p = 263 we recompute the three G-orbits as above, and this time we compute 
all of the 6ik, as we did in Algorithm 1. Setting y\ = 6u = 49, yi = #24 = 87, and 
2/3 = $34 = 205, we recompute V mod p as above. 

We now let z be the reduction of <p(y) mod p and find that z — 73 (which is not 
a root of V mod p, as expected). We then compute 

zi = V(z)/(z - 2/1) = (z - y 2 )(z - 2/3) = P 7, 

z 2 = V(z)/(z - 2/2) - {z - yi){z - 2/3) = P 211, 

z 3 = V(z)/(z - 2/3) - (z - yi)(z - 2/2) = P 122. 

The 0j can be computed in two ways: (a) evaluate V(z), invert the (z — j/i), and 
compute the Zi's as products, or (b) simultaneously compute the zi as the comple- 
ments of the (z — j/i) using a product tree, as in [42, §6.1]. Both use 0(m) field 
operations, but (b) is faster and works even when z is a root of V mod p. 

We now compute the Wk as linear combinations of the Zi with coefficients Oik ■ 

w Q = W (z) = 6 wZl + e 2a z 2 + 30 z 3 = p 227, 

w x = Wx(z) = flii^i + 021^2 + 6»3iz 3 = p 79, 

w 2 = W (z) = 12 zi + 22 z 2 + 9 32 z 3 = p 44, 

W 3 = W {z) = 01321 + 023-22 + 33 z 3 = p 242. 

When evaluated at z, the polynomials Wk modp computed by Algorithm 1 yield 
the same values Wk above. But here we obtained the Wk without computing the Wk , 
using just O(h) operations to compute them directly from the j/i and the 0^. Most 
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importantly, the CRT data used to compute the Wk mod q only consumes 0(m log q) 
space, versus O(hlogq) for the Wk mod q. 

After computing the Wk mod p for all p € 5, the explicit CRT (mod </) yields: 

w = q 180694, ioi = q 270105, w 2 = g 92440, w 3 = q 110998. 

We then evaluate V'(y) mod g and use the Wk to compute 

U(X, y) = X 5 + yX 4 + (w 3 X 3 + w 2 X 2 + Wl X + w Q ) 

= q X 5 + 336976X 4 + 556976A 3 + 849678A 2 + 363260A + 95575, 
and find that x = 590272 is a root of U(X, y), and hence of Hp, modulo q. 

4.2. Time complexity. To simplify our analysis we shall initially assume that 

(15) mlog g = 0(|T>| 1/2 log|T>|). 

Depending on m = h/\G\, this may allow \ogq to be exponentially larger than 
log \D\, but here we have in mind the case where logg is polynomial in log \D\; see 
§4.4 for an approach better suited to large q. Assuming (15), our bound on \S\ is 
the same as in our analysis of Algorithm 1 in §3.2. Under the GRH we have 

(16) IS^OOiJI^logloglDI), 

which bounds the total expected number of iterations in steps 4 and 7. 

As with Algorithm 1, the expected running time of Algorithm 2 is bounded by 

(17) 0(\S\(T iind + 

^enum + Tbuild) + T roo t) , 

where Tfi n d, T cnum , Tbuiid, and T root are as defined in §3.2. The term T nn d is the 
same for both algorithms, and the term T cnum is doubled in Algorithm 2. The 
bound 0(h log 2 h log 1+e log p) on Tbuiid in Algorithm 1 becomes 

(18) Tbuiid = 0{m log 2 to log 1+e p + h log 2 n log 1+e p) 

for Algorithm 2, with p = max S. The first term in (18) is the time to build V , while 
the second is the time to compute the Oik and Wk- For suitable n, say n = log c h 
for some c > 2, Algorithm 2 effectively reduces Tbuiid by a factor of log 2 h. There 
is also a minor improvement in T root ; the bound given in (14) becomes 

(19) T root = 0((m + n)logMog 2+e g ). 

In both the GRH-based and heuristic complexity analyses of [42, §7], the bound 
for T enum dominates the sum Tg n( j + T enU m + Tbuiid- Thus a better bound on Tbuiid 
does not improve the worst-case complexity. However, the worst-case scenario is 
atypical, and for almost all D (a set of density 1) this sum is dominated by Tbuiid- 
This is heuristically argued in [42, §7], and it can be proven using [6] and assuming 
the GRH (but we will not do so here). 

To remove the worst-case impact of T enum (and also T nn d), let us consider the 
performance of Algorithm 2 on a suitably restricted set of discriminants. Fix posi- 
tive constants c\, . . . , c 7 , and e. For a positive real parameter a, let V(a) denote 
the set of negative discriminants D with the following properties: 

(i) The set of integers p < ci\D\ log 1+£ \D\ satisfying 4p = t 2 - v 2 D with 
v > c 2 log 1/2+£/3 \D\ contains at least C3IDI 1 / 2 log e/2 \D\ primes. 
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(ii) There are at least two primes I < c 4 log 1/2 \D\ for which (f) = 1. 

(iii) There is a divisor of h = h(D) in the interval [05 log" h, exp(c6 log 3 / 4 /i)] . 

Conditions (i) and (ii) ensure that for D € V{a), both Tfi„d and T enum are bounded 
by OdDl^log 572 ^ \D\); we refer to [42, §7] for details. Provided that a > 1/2, 
for D ef(a) we can choose G so that 

C5 log 172 h < m < exp(c6 log 3 / 4 ft.) and n < h/ (C5 log 1 / 2 /i), 

where n = \G\ and ran — h. Applying (18), we see that Tbuiid is then also bounded 
by OdZ?! 1 / 2 log 5/2+e 

To ensure that our assumption in (15) is satisfied, let us define 

V D {a) = {qeV D : logq < c 7 \og 1+a \D\}. 

This definition is more restrictive than necessary, but for simplicity we impose a 
uniform bound. For q G Vd(c() we then have T root = O^D] 1 / 2 ^), which is a 
negligible component of (17). This yields the following proposition. 

Proposition 1. Assume the GRH. For all a > 1/2, D e T>(a), and q <G Td{cv) 
there is a choice of G for which the expected running time of Algorithm 2 is 
0(\D\log 5 ^\D\). 

Proposition 2. For all a < log 2, the set T>(a) has density 1 in the set of all 

imaginary quadratic discriminants. 

Proof. It suffices to show that each of the properties (i) , (ii) , and (iii) hold for a set of 
discriminants D with density 1 (in the set of all imaginary quadratic discriminants). 
For (i), this follows from [6, Thm. 2]. 

For (ii), consider just the odd primes i\, . . . ,£k less than log log \D\ < C4 log 1 / 2 \D\, 
for sufficiently large \D\. The proportion of congruence classes modulo L = Y\di 
corresponding to integers x for which (j-) 7^ 1 for all but at most one ti is equal to 

n^ + E^n^< ( 2 /3) fe + (V2)(2/3)- 1 = 0(1). 

Thus the number of discriminants in the interval [—2D, —D] that do not satisfy 
property (ii) is o(|£)|). 

For (iii), recall that for any e > and almost all integers n there are at least 
(1 — e) log log n distinct prime divisors of n; see [29, Thm. 431]. Therefore almost all 
discriminants D have at least k = (1 — e) log log \D\ — 1 distinct odd prime factors, 
and for all such discriminants, h = h(D) is divisible by 2 fc ~ 1 ; see [20, Lem. 5.6.8]. 
As shown by Siegel, log h = (1/2 + o(l)) log \D\, thus for all a < log 2 we can choose 
e > so that 2 fe ~ 1 > c 5 log" h for all sufficiently large \D\. □ 

Propositions 1 and 2 together imply that, under the GRH, for almost all dis- 
criminants D < one can find a root of Hp mod q in 0(|-D| log 5 / 2+e \D\) expected 
time, for all q £ Vd{o) with 1/2 < a < log2. 

We note that the time required to identify and select a suitable G C cl(0) is 
negligible by comparison. When D is fundamental we can obtain a set of generators 
for the class group cl(O) using ideal class representatives of prime norm bounded by 
6 log 2 |D|, under the GRH [5], and for non-fundamental D we also include ideals of 
prime-power norm for primes dividing the conductor, of which there are 0(log \D\). 
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Given a generating set S for cl(O) of size 0(log 2 \D\), we can apply generic algo- 
rithms to compute the group structure and an explicit basis for cl(O) consisting of 
elements of prime-power order, in time 0(/i 1 / 2+£ ), where h = |cl(0)| = h(D), via 
[43, Prop. 4]. 7 Once this has been done, it is easy to construct an explicit basis for 
a subgroup G of any desired order n dividing h. 

Remark 3. It is usually better to choose G to optimize the height bound B 
rather than making the choice required by Proposition 1. Heuristically, this yields 
a better improvement in the time complexity, a factor of nearly log \ D\ rather than 
log 1/2 |£>|, see Heuristic Claim 1 in 85. 

4.3. Space complexity. For convenience we assume the GRH and the restriction 
(15) on the size of q. An alternative approach with a weaker restriction on q is 
given in the next section. Under these assumptions, logp ~ log \D\ for all p E S, 
and \S\ = OGDI^logloglDQ, as in (16). A straightforward implementation of 
Algorithm 2 yields a space complexity of 

(20) 0(\S\ log \D\ + \S\ log q + ft log \D\ + (m + n) \ogq) . 

The first term of (20) represents storage for the set S, the second term is storage 
for prccomputed data used to apply the explicit CRT (mod q), the third term is 
storage for Ello(F p ), and the last term is storage for the n values Wk and the m 
coefficients of V that are computed via the explicit CRT (mod q) . The second term 
dominates, and we can immediately bound (20) by OQD} 1 / 2 log log \D\ logq), which 
is the same as the space complexity of Algorithm 1. 

However, the only essential term in (20) is the last one, which may be as small 
as 0(\D\ 1 ' A logg). We now show how to eliminate the first three terms in (20). 
In practice the most useful term to eliminate (or reduce) is the second one, which 
requires only a very minor change, but we consider each in turn. 

4.3.1. The first term. We wish to avoid storing the entire set 5* at any one time. 
Our strategy is to process S in batches of size 0(\D\ C ), for some positive c < 1/4. 

In [42] the set S is chosen by enumerating a larger subset S z C Vd defined by 
a parameter z that depends on the bound B. The primes in S z satisfy the norm 
equation 4p = t 2 — v 2 D, where v is 0(log 3+e \D\), and the bound on t depends 
on v, h(D), and z, but is in any case 0(\D\ 1 / 2+e ). As an alternative to the sieving 
approach of [42, Alg. 2.1], we enumerate S z by running through all the integers of 
the form (t 2 — v 2 D)/A, with v and t suitably bounded, and applying a polynomial- 
time primality test [2] to each. This takes 0{\D\ x / 2+t ) time and negligible space. 

Each prime p in S z is assigned a "rating" r(p), that reflects the expected cost 
of finding j\ € Ello(F p ). The details are not important here, but we may assume 
that the positive real numbers r(p) are distinctly represented using a precision of 
0(log |_D|) bits. Let r max be the largest (worst) rating among the primes in S z . For a 
suitable re [0, r max ], we then let S consist of the primes in S z with ratings bounded 
by r, where r is chosen so that S has the appropriate size. We can determine such 
an r space-efficiently using a binary search on the interval [0, r max ], enumerating S z 
at most |~log 2 \S Z \~] times. Similarly, we can partition S into batches of size \D\ C 
by partitioning the interval [0, r] into subintervals whose endpoints are determined 
using a binary search. The total cost of computing this partition is 0(|-D| 1-c+£ ), 



7 Note that the exponent lcm Qg ,g|a| of cl(O) can be computed in time 0(/i 1//2+e ); see [41]. 
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and this also bounds the cost of enumerating all the batches in S. Thus the time 
complexity is negligible and we use o space. 

4.3.2. The second term. Let M be the product of the primes pi in S. As de- 
scribed in §2.3, when applying the explicit CRT (mod q), we precompute integers 
Mi = M/pi mod q and at = (M/pi)^ 1 mod pi; this would normally be done in 
step 3 of Algorithm 2. The total size of the Mj is 0(\S\ logg), which accounts for 
the second term of (20). Rather than computing the Mi in step 3, we just compute 
M mod q in step 3, and when we need M t during a CRT update step for pi (steps 
4d and 7d), we invert pi mod q and multiply to obtain Mi — (M/pi) mod q. This 
reduces the second term of (20) from 0(\S\ logg) to 0(\S\ log \D\ + logq), which is 
the space used by the <n and M mod q. 

With the changes described in §4.3.1, we compute the primes pi in S in batches 
of size |-D| C , where c < 1/4. We now do the same with the a^, computing the 
di for each batch of primes pi as follows. If N is the product of the primes in 
the batch, then it suffices to compute the integer (M/N) modulo N, and then 
simultaneously compute the product of (M/N) and (N/pi) modulo pi, for all the 
primes pi in the batch, to obtain the dj. The a, (and the pi) for a given batch 
arc stored only as long as it takes to process the batch; this means that they are 
computed twice in Algorithm 2: once in step 4 and then again in step 7. Using a 
product tree for each batch, it takes 0(\D\ 1 ~ c+e ) time to compute the Oj for all the 
batches, which is negligible. With this change, the second term of (20) is reduced 
to o(|D| 1 /4) + 0(l gg). 

The only data that is retained once the processing of a given batch of primes has 
been completed are two values associated to each coefficient c that is to be computed 
modulo q (the m coefficients of V mod q in step 5, and the n values Wk mod q). 
These two values are the partial sums J2 c i a iMi and ^2ciCii/pi, where the first 
is computed modulo q and the latter is stored with a precision of O(logg) bits, 
as described in §2.3. These two values are updated as each prime pi is processed 
(across all the batches), and never require more than 0(log q) bits. There are a total 
of m + n coefficients, which accounts for the fourth term 0((m + n) \ogq) listed in 
(20), which also bounds the space required to perform the CRT postcomputation; 
using the accumulated partial sums, the postcomputation is effectively just a single 
subtraction to evaluate (3). 

4.3.3. The third term. Steps 4b and 7b of Algorithm 2 both enumerate the G- 
orbits of ji € Ello(Fp), using the polycyclic presentation T. Let jn~ denote the A:th 
element of the G-orbit Gi = (jn, ■ ■ ■ ,ji n ), where i ranges from 1 to m and k ranges 
from 1 to n. There are a total of h — mn values jik, and these account for the 
0(h log \D\) third term in (20), using the GRH bound logp = 0(log \D\). In order 
to reduce the space required, we need to process the jik as they are enumerated, 
rather than storing them all. Our basic strategy is to order the enumeration so 
that we compute the G-orbits one by one and discard each G-orbit once it has been 
processed (in fact, we need to enumerate the G-orbits twice in step 7 in order to 
achieve this). Depending on the presentation T, enumerating the G-orbits efficiently 
may present some complications. These will be addressed below. We first consider 
how to process the G-orbits in a space-efficient manner as they are enumerated. 

Let us recall the values we must derive from the jik- In each iteration of step 4 
we compute m values yi mod p, each of which is a linear combination of elementary 
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symmetric functions applied to Gi = (jn, . . . ,ji n )- We then compute the polyno- 
mial V mod p whose roots are the . In each iteration of step 7 we again compute 
the yi mod p, and also the coefficients Oik of the polynomials 

n n 

Pi{x) = n = ( x - fa) = e 9 * xk 

k=l k=0 

defined in (5). Up to a sign, the 6ik are just the elementary symmetric functions of 
jn, . . . ,ji n , so we can derive each yi from the Oik- The yi are then used to compute 
V mod p (again), and also the values Zj — V(z)/(z — y\), where z = tp(y) mod p is 
the reduction of the integer ip(y) corresponding to the root y of V mod q computed 
in step 6. Finally, we compute the values Wk = Y^iLi & 

described in 84.1. 

The space required by the yi, the zt, the Wk and the polynomials V and V 
is just 0((m + n)logp), which is within our desired complexity bound. We now 
explain how to process the G-orbits in a way that achieves this space complexity. 
For each Gj we compute the polynomial Pj with coefficients 0^, use the 0^ to 
compute yi, and then discard Gi and the Oik- In step 4 this is all that is required; 
once all the G-orbits have been processed we compute V(Y) — Y\™ =l (Y — yi). In 
step 7 we proceed as in step 4, and after computing V and the yi we compute V 
and the values Zj = V(z)/(z — yi). We then enumerate the G-orbits a second time, 
recompute the Oik as above, and then update each of n partial sums Wk — OikZi by 
adding the term OikZi. The space required to process a G-orbit is O(nlogp) for the 
jik and the Oik, which are then discarded, plus a total of 0((m + n)\ogp) space 
used to store the value yi and Zj, and the partial sums Wk that are retained. Thus 
we can process all the G-orbits using 0((m + n) logp) space. 

We now consider the enumeration of the G-orbits. If the polycyclic presen- 
tation T for cl(O) uses the sequence of ideal classes [li], . . . , [l r ] and the chosen 
subgroup G is generated by a prefix of T, say 

(21) G={[W],...,U), 

for some d < r, then the elements jn, . . . ,ji n of the G-orbit Gi will appear consec- 
utively in the enumeration of Elle>(F p ) obtained using T and no special processing 
is required. But the situation in (21) is very special, much more so than condition 
(9) given in §2.6. Indeed, it forces G to be trivial if r = 1. We can ensure that 
G has the form in (21) if we construct T by computing a polycyclic presentation 
for G and extending it to a polycyclic presentation for cl(O). Unfortunately, the 
norms arising in a polycyclic presentation for a proper subgroup of cl(O) may be 
very large: il(|Z?| 1//3 ) in the counterexample of [42, §5.3]. 

To address this, we compute the action of ideals with uncomfortably large norm 
by representing them as a product of ideals with small norms. Assuming the GRH, 
it follows from [16, Thm. 2.1] that every element [a] of cl(O) can be expressed in 
the form [a] = [pi • • • pt], where the pi are ideals of prime norm bounded by log c \D\, 
for any c > 2 , and t = \C log h / log log | D \ ] — O (log \ D\), for some constant G that 
depends on c. We can find such a representation in 0(h 1+e ) expected time (using 
negligible space), by simply generating random products of the form [pi • • • p t ] until 
we find [a]. 8 After precomputing such a representation in Step 2, we can compute 



This can be improved to C^/i 1 /^) time and O^ 1 / 2 ^) space using a birthday-paradox 
approach with a time/space trade-off, but we don't need to do this. 
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the action of any element of cl(O) on any element of EU (F P ) in <3(log 6+£ \D\) 
expected time, allowing us to efficiently handle arbitrarily large norms in T. 

In contrast to the first two terms, removing the third term from (20) may in- 
crease the overall running time significantly. The time complexity is increased by a 
logarithmic factor, but in return, the space complexity may be reduced by an expo- 
nential factor. We did not make this trade-off in our implementation, as the space 
complexity of 0(h\ogh + (m + n) \ogq) achieved by simply optimizing the second 
term is already more than sufficient for the range of D used in the examples of §6. 
However for larger computations, increasing the running time by a poly logarithmic 
factor in order to reduce the space by an exponential factor may be very attractive, 
especially in a parallel implementation. 

Eliminating the first three terms of (20) leads to the following proposition, which 
was summarized in the introduction. The constraints on m and q ensure that (15) 
is satisfied. 

Proposition 3. Assume the GRH, and fix S > 0. Let D be a discriminant 
with class number h = mn and let q € Vd, such that m < O^D] 1 / 2 ^ 6 ) and 
log q = 0(\D\ S log \D\). With modifications 4-3-1-3, the expected running time of 
Algorithm 2 on inputs D and q is 0{\D\ log 6+£ \D\), using 0[{m + n) \ogq) space. 

4.4. Handling large q. When q is not constrained by (15), the bound mq m ~ 1 B 
used by Algorithm 2 to determine the size of S may lead to a significant increase 
in the running time of Algorithm 2 relative to Algorithm 1, which just uses the 
bound B. To better handle large q in a space-efficient manner, we make a minor 
modification to Algorithm 2 that allows us to use the bound mqB instead. Unless 
q is extraordinarily large, this will not be significantly different than using the 
bound B. We are forced to give up the improved bound on Xbuiid in (18), but 
the resulting algorithm will still have a running time that is at worst twice that of 
Algorithm 1, and will typically be only slightly slower. 

The key is to avoid exponentiating ip(y) in F p . Instead we compute all of the 
powers y, y 2 , y 3 , . . . , y m ~ x in F g , and then lift these to integers <p(y), ■ . . , ip{y m ^ 1 ) 
in the interval [0,q — 1], which can be reduced modulo p. This is done between 
steps 6 and 7 of Algorithm 2. We now modify step 5c to compute the coefficients 
of Wk = a ikY l mod p as in Algorithm 1, and then compute values w' k mod p via 

w' k = a m _i, fe (^(y m_1 ) + o m -2,^(!/ m_2 ) H h a lik <p{y) + «o,fe, 

that we use instead of Wk = Wk(f(yj) mod p. Note that the integers w' k are not 
equal to the integers Wk(<p(y)), but we have w' k = Wk(ip(y)) mod q, which is all 
that is required, even though wu and w' k will typically be distinct modulo p. 

We may combine this approach with any of the space optimizations considered 
in the previous section. When q is very large, the third term of (20) is likely to be 
dominated by the others, so we only optimize the first two terms of (20). Provided 
log q = 0(\D\ r / 2 ), the analysis in [42, §7] then yields an upper bound on the running 
time of this modified version of Algorithm 2. 

Proposition 4. Assume the GRH. Let D be a discriminant with class number 
h = mn, and let q G Vd satisfy \ogq = 0(|£>| 1//2 ). With modifications 4-3-1-2 and 
4-4, the expected running time of Algorithm 2 on inputs D andq isO(\D\ log 5+e \D\), 
using 0((m + n) log q + h log h) space. 
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5. Height bounds 

In this section we derive an upper bound B on the absolute values of the integer 
coefficients of V and Wk defined in §2.5. More precisely, we compute a height 
bound b on the maximum bit- length of any coefficient occurring in V or Wk- This 
is used by Algorithms I and 2 to choose a set of CRT primes S whose product 
exceeds AB = 2 b + 2 . 

We first derive a general height bound 6 max that depends only on D and can 
be used with any choice of the subgroup G C cl(O) and the random symmetric 
function s = e\ + C262 + • • • + c„e„ constructed in Algorithms I and 2. Under the 
GRH, we have 6 max = O^D] 1 / 2 log \D\ loglog \D\), which is all that is needed for 
the proofs of Propositions 1-4. 

We then fix s = e\ and derive height bounds that depend not only on D, but 
also on the subgroup G. As may be seen in Table I, the actual heights can vary 
significantly with G. An optimal choice of G may improve the performance of 
both Algorithms I and 2 substantially, provided that we have a height bound that 
accurately reflects the impact of G. Heuristically, we expect to reduce 6 by a factor 
of log h/ log log h, on average, by computing a customized b for each candidate 
subgroup G and choosing the best one. Of course the optimal choice of G depends 
not only on b, but also on how the size of G impacts the complexity of building 
polynomials and finding roots, but when log q <C h the height bound is usually the 
most critical factor. 

Throughout this section we work with j-invariants, which allows us to use rig- 
orous (and quite accurate) bounds on their size. Heuristic bounds for other class 
invariants can be obtained by scaling linearly, as discussed in §6. We note that 
all the bounds we derived in this section hold unconditionally; the GRH and the 
heuristic analysis in §5.4 are only used to obtain asymptotic growth estimates. 

5.1. Height bound derivations. As in §2.5, let G = {/3\, . . . ,/3 n } be a subgroup 
of cl(O) with cosets a\G 1 . . . , a m G, so that every element of cl(0) is of the form 
Q-iPk with I < i < m and 1 < k < n. We may uniquely represent ctiPk by a primitive 
reduced binary quadratic form A^x 1 + B^xy + Ciky 2 with discriminant D. If r,/j 
denotes the complex number (—Bo- + ^/~D)/{2Aik), we have 

HD(X) = U(X-3(T tk )), 

i.k 

where j(z) is the classical modular function. We assume without loss of generality 
that ai and f3\ are the identity element, and fixi = j^n) so that [cxif3k]x — j{Tik)- 
We shall use the explicit bound 

(22) |j(r ife )| < exp (n^\D\/A lk ^ + 2114.567 

proven in [21, p. 1094], and define bik to be the logarithm in base 2 (denoted "lg") 
of the RHS of (22), and we note that bik > 0. We define the height hi(F) of a 
nonzero polynomial F(X) = ^CjX^ in C[X] by ht(F) = lgmax|cj|. We seek an 
upper bound b on max{ht(U), Ir^VFfc)} in terms of the bik- 

The largest bik is &n, since ai/?i is the identity and we therefore have An = 1. 
The bound (22) is nearly tight (one can prove a similar lower bound), and this 
implies that b\\ ~ \g(e)ir ^J\D\. This is about ten times the typical value of h, so 
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\G\ 


bits 


\G\ 


bits 


\G\ 


bits 


\G\ 


bits 


1 


1983568 


35 


1017514 


182 


639986 


910 


395909 


2 


1737305 


39 


955880 


195 


672404 


1001 


444642 


3 


1600984 


42 


959237 


210 


649274 


1155 


413905 


5 


1464042 


55 


879633 


231 


607751 


1365 


392521 


6 


1430692 


65 


841574 


273 


603539 


1430 


402990 


7 


1354754 


66 


780769 


286 


540873 


2002 


414360 


10 


1286551 


70 


877290 


330 


531985 


2145 


422627 


11 


1235548 


77 


791884 


385 


522887 


2310 


401968 


13 


1202022 


78 


760840 


390 


540120 


2730 


409766 


14 


1188816 


91 


756960 


429 


525472 


3003 


436780 


15 


1195102 


105 


773983 


455 


430383 


4290 


471475 


21 


1093207 


110 


677448 


462 


487746 


5005 


507403 


22 


962794 


130 


720919 


546 


492453 


6006 


549648 


26 


1010539 


143 


697728 


715 


452019 


10010 


756598 


30 


1006310 


154 


616795 


770 


429293 


15015 


1039684 


33 


998157 


165 


678832 


858 


437618 


30030 


1983568 



Table 1. Actual heights for various G C cl(O) with D = -221606831. 



we shall not be concerned with optimizing terms that are asymptotically smaller 
than h, which we may assume includes both m and n, and even m\gn and nlgm. 
As in §2.5, the Oik are defined via 

n n 

(23) Pi(X) = H (X - j(r ik )) = w fc > 

fe=l k=l 
where Y\™ =1 Pi = Hp- This yields the bound 

n 

(24) ht(Pi) + 

k=i 

For V(Y) = J]™ i(Y-yi), with y l = s{j(rn), . . . ,j(T in )), we assume that s is of the 
form s = ±(ei + c 2 e 2 + • • • + c„e„) with c fe > 0, and define z l — |s(2 b » 1 , . . . ,2 b "*)|. 
We then have \yi\ < Zi, with Zi > 1. Thus 

m 

(25) ht(V) < m + ^lgz 2 . 

i=l 

For the nonzero polynomials Wk{Y) — Y^Li ®ik Yli^iO^ ~ Vi)> we n0 * e * na1; 

m 

i=l iyii 
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is positive, and bounds every coefficient of W k - We have 

m 

(26) ht(W fe )<lg(^|^|2 ro J]^-) <lg(mmax|^ fe |2 m n^) 

i—l i^i 

m + lgra + max^ht(i^) + ^ lg^ ■ 



i — 1 2^2 2^2 

This expression does not depend on k, thus it applies to every nonzero Wk- 



5.1.1. A general height bound. For any choice of s = e\ + c 2 e 2 + • • • + c„e„ with 
< c k < 2m 2 , we have z, < 2m 2 n2 n - 1 2 b * 1+ - +b ^ , using (™) < 2™" 1 , and therefore 

n 

(27) lgz, <n + lgn + 21gm + ^6 lfe . 
From (25) we obtain 

(28) ht(V) < m + mn + m\gn + 2m\gm + ^b lk < 3h + 2h lg h + ^ b ik , 

i.k i,k 

where we have used h = mn > m\gn. Using the crude bounds max^ J2k^ ik — Si k ^ik 
and J2i^i^S z i < ^2i^S z i: an d applying (24) and (27) to (26), we obtain 

(29) ht(Wfc) < m + lgm + n + mn + mlgn + 2m\gm + 2 ^ b. 



'ik 

k 



< 5h + 2hlgh + 2j2 b ik- 

i,k 

Since the bound in (29) dominates the bound in (28), we define 

(30) b max = hh + 2h\gh + 2^b ik , 

i.k 

where i runs from 1 to m and k runs from 1 to n. 

Recall that, under the GRH, we have the bounds h = O^D^/ 2 log log \D\) and 
Ej.fc l / A ik = 0(log|D| log log \D\), as noted in §2.4. These imply 

(31) 6 max = 0(\D\^ 2 log \D\ log log \D\). 

5.1.2. Optimized height bounds. We now fix s — e±, which implies zi — Y^ k =i^ik- 
This choice of s minimizes our height bound and simplifies the calculations. 
We have 

n 

(32) lg z t < lg (J^ 2h,fe ) < lg n + max b lk , 

fc=i 

and from (25) we find that 

m 

(33) ht(U) < m + mlgn + \J max6jfe. 

— ' k 



i=l 
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Applying (24) and (32) to (26) yields 

ht(Wfc) < lgm + m + ma,x(n + ^ b lk + J^(lgn + max 6^)) 

k i^i 

< lgm + in + n + mlgn + maxf b ik + max 6^ — max 6 ife J 

2 \ k k / 

k i 

< lgra-hm-hn + mlgn + max + max ( 6^ — max ) , 

i k 

where i runs from 1 to m and A: runs from 1 to n. This bound dominates the bound 
in (33), so it bounds ht(V) as well as ht(W k ). Thus we define 

(34) 6 = lgTO + TO + n + mlgn + max&ifc + maxf 6^ — max b ik ) 

k i \ k J 

i k 

as our height bound for G, which we typically round up to the nearest integer. 



5.2. Example. Returning to our example with D = —971 and h(D) = 15, let 
us compute b for the subgroup G C cl(O) of order n — 5. We can use the same 
polycyclic presentation [Ii], [[ 2 ] for cl(O) as before, where the ideals (i and \<i have 
norms i\ — 3 and £ 2 — 5 and [li] generates G, but now we compute directly in the 
class group, using composition of binary quadratic forms [14] rather than computing 
isogenies. In the notation of §5.1, we have j3 k = and on = [t 2 -1 ]- Enumerating 

cl(O) yields the approximate values 

bu = 141.23, &i2 = 47.08, b 13 = 15.75, b u = 15.75, 6i 5 = 47.08, 
6 21 = 28.25, 6 22 = 11.45, 6 23 = 20.18, 6 24 = 11.96, 6 25 = 11.45, 
631 = 11.45, 632 = 11-96, 6 33 = 20.18, 634 = 11.45 , 6 35 = 28.25, 

where each row corresponds to a coset of G. The value of 6 22 , for example, is 
computed using A 22 = 15, since a 2 /3 2 = [l 2 (i] is represented by the reduced binary 
quadratic form 15X 2 + 13XY + 19Y 2 , and we have 

6 22 = lg(exp( 7 rv / 97T/15) + 2114.567) w 11.45. 

To compute 6 we just need the sum Si and maximum ti of b ik over k. These can be 
computed during the enumeration and stored using 0(m log | D |) space. We have 
si = 266.89, s 2 = 83.28, s 3 = 83.38, and h = 141.23, t 2 = 28.25, t 3 = 28.25. The t t 
sum to 197.73, the maximum of Si — ti is 125.65. Applying (34) yields 

6«lg 3 + 5 + 3 + 3 lg 5 + 197.73 + 125.65 < 340, 

and our height bound is 340 bits. 

As noted earlier, we may compute V and the W k over Z using Algorithm 1, by 
computing them modulo some q > 2 b+1 . We find that max{ht(F), ^(W^)} is in 
fact about 324, within five percent of our computed bound 6. 

For comparison, if we instead choose G to be the subgroup of order 3, we obtain 
si = 165.15, s 2 = 55.45, s 3 = 78.71, s 4 = 78.71, s 5 = 55.45, and t Y = 141.23, 
t 2 = 28.25, t 3 = 47.08, t A = 47.08, t 5 = 28.25. The U sum to 291.89, the maximum 
of Si — ti is 31.63, and (34) becomes 

6 w lg 5 + 3 + 5 + 5 lg 3 + 291.89 + 31.63 < 342. 
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If we make G trivial, we have t% — s% — bn 1 the sum of the U is 421.51, the maximum 
of Si — ti is zero, and we get b — 438, which is nearly the same as the value 434 one 
obtains from [42, Lemma 8], which uses a more careful analysis than we do here. 

5.3. Computational complexity of optimizing the height bound. We can 

compute a basis (71, ... , 7 r ) for the class group cl(O) in 0(|Z)| 1 / 4+£ ) time and space, 
under the GRH [14, Prop. 9.7.16]. We may assume that each 7, has prime power 
order l e > . We then consider subgroups G of the form 

(35) G=( 7 ?)x...x( 7 ^ 

with < di < ej, which includes subgroups of every possible order n dividing 
h = h(0). There are at most h such subgroups G, and enumerating the cosets of 
G takes 0(h log 1+e h) time, using fast composition of forms [38]. 

Thus we can compute a height bound b for every subgroup G of the form in (35) in 
0{h 2 log 1+e h) time. This is 0(\D\ log 1+e \D\) under the GRH, which is dominated 
by our bounds for the running times of Algorithms 1 and 2. In fact, there are only 
0(h e ) distinct orders that can arise among the candidate subgroups G, and if we 
restrict our attention to subgroups of the form in (9), there may be even fewer G 
to consider. In practice, the time spent optimizing b is completely negligible (and 
well worth the effort in any case). 

As noted in the example, we only need 0(m log |D|) space to compute the height 
bound for a given subgroup G, which is within the complexity bound of Proposi- 
tion 3. 

5.4. Heuristic analysis. Ignoring the minor terms in (34), the value 

m n 

b* = max b ik + max(y^ b ik - ma,xb it ?j 
»=i k 1 k=i k 

closely approximates b. When m — h or n — h we have b* = J2i k ^ik, and in any 
case b* is never greater than this sum. As with 6 max , this yields an asymptotic bound 
for b* of 0(|L>| 1 / 2 log| J D|loglog|L>|), under the GRH, which then also bounds b. 
This is all that can be said in general, since h could be prime. 

But h is rarely prime. This can occur only when \D\ is prime, and even then 
it is unlikely (by Cohen-Lenstra [17]). Let us consider the typical situation, where 
we are more or less free to choose the size of G, at least up to a constant factor. 9 
Of course 6* depends on the particular choice of G, not just its order n, but to 
simplify matters we focus on n, and proceed to derive a heuristic estimate for b* as 
a function of n and m = h/n. 

Let us assume that the cosets Gi of G are ordered so that min^ An- is increasing 
with i (thus Gi = G contains the identity element ai/?i with An — 1). As a 
heuristic, let us suppose that, on average, we have 

i,k i,k t=l 



9 Under the random bisection model, a random integer N in some large interval will have prime- 
power factors whose logarithms approximate a geometric progression [4]. One then has divisors 
of N in most intervals of the form [M,cM] C [1,N], for a suitable constant c. We hcuristically 
assume that the same applies to h. 
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That is, we view ^ 1 /Aik as an approximation to a harmonic sum in which the terms 
corresponding to the zth coset of G appear at positions i, i+m, . . . , i+(n— l)m. This 
heuristic is based on empirical data collected during the construction of Table 2, 
which involved analyzing the subgroups of more than 10,000 distinct class groups 
cl(O), with discriminants ranging from 10 5 to 10 16 . We should emphasize that 
for any particular choice of cl(O) and G, the actual situation may deviate quite 
significantly from this idealized scenario, but if one averages over a large set of 
class groups and a large sample of their subgroups, one finds, for example, that 
the average rank of min^ among all the is approximately i, and we note 
that the approximation J2i k lt~Z ~ l°g^ ls correct to within an O (log log h) factor, 
under the GRH. In any case, our primary justification for this heuristic is that it 
yields predictions that work well in practice. 
Applying the heuristic yields 

> max — — = > - ss log m, 

^ k ^ fr{ * 



and 



n-l 



max 



logn 



: ( / max — ) = > — ; 

which implies that b* is within a constant factor of 

(lcgm + ^ \D\W. 
\ m ) 

This suggests that if we wish to minimize b* , then we should make n exponentially 
larger than m. If we let m w log h and n = h/m w h/logh, then we expect 
to have b* = O^D] 1 / 2 log log \D\), improving our worst-case bound by a factor of 
log | Z? |, and improving the average case, where ^ k -j— w log/i, by a factor of 
Iog|£>|/IogIog|D|. 

Using m = \ogh allows us to satisfy the bound (15) used to analyze Algorithm 2 
whenever \ogq — O^D] 1 / 2 ), which is a very mild restriction. This choice of m 
precludes the improvement attained by Algorithm 2 under Proposition 1, since it 
makes n too big, but it does lead to the following claim. 

Heuristic Claim 1. Assuming \ogq = 0(|D| C ) for some c < 1/2, the average-case 
running time of both Algorithms 1 and 2 using s = e\ is 0(|Z?| 1//2 log 2+c |D|). 

Empirical data supporting the heuristic analysis above can be found in Table 2. 
Each row of the table gives data for 1000 fundamental discriminants of approxi- 
mately the same size. We note that for the choice of G that minimizes b, the num- 
ber of cosets m is quite close to log h, on average, as expected. The two rightmost 
columns list, respectively, the average and best-case improvement achieved by opti- 
mizing the height bound. The growth rate is consistent with the Q(log/i/loglog/i) 
prediction. In practice, the actual speedup is substantially better than the height- 
bound improvement would suggest, for reasons that will be explained in the next 
section where we analyze practical computations that amply demonstrate the ben- 
efit of optimizing the height bound. 
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N 


h 


n 


TO 


h 




bh/b n 


maxbh/bn 


10 5 


147 


39 


3.7 


7626 


4486 


1.7 


3.0 


10 6 


459 


99 


4.6 


28387 


14892 


1.9 


3.6 


10 7 


1470 


254 


5.8 


105184 


48667 


2.2 


4.2 


10 8 


4632 


671 


6.9 


377174 


157603 


2.4 


4.9 


10 9 


14640 


1740 


8.4 


1339636 


509688 


2.6 


5.5 


10 10 


46434 


4849 


9.6 


4709013 


1644023 


2.9 


5.6 


IO 11 


146598 


14777 


9.9 


16338099 


5374105 


3.0 


6.1 


IO 12 


462979 


41189 


11.2 


56202741 


17182753 


3.3 


6.6 


10 13 


1460465 


114560 


12.7 


191932881 


54720882 


3.5 


7.1 


IO 14 


4644982 


377059 


12.3 


656497242 


179083436 


3.7 


7.4 


10 15 


14608895 


964998 


15.1 


2211596515 


560192565 


4.0 


9.2 


IO 16 


46276481 


2695634 


17.2 


7462636834 


1742205583 


4.3 


9.2 



Table 2. Height bound optimization 

Each row summarizes data collected for the first 1000 fundamental discriminants \D\ > N. 
The value bh is the unoptimized height bound, corresponding to \G\ = h, while 6„ is the 
optimized height bound, attained when \G\ = n. Bars denote mean values. 



6. Computational results 

This section presents performance data and computational results. In order to 
handle a wider range of discriminants, and to give the most practically relevant 
examples, we use class invariants derived from various modular functions to which 
the CRT method has been adapted. These include, among others, the Weber f- 
function, double ^-quotients, and the Atkin functions Ajy. We refer to [24, §3] 
for definitions of these invariants and a detailed discussion of their implementation 
using the CRT method. Here we briefly summarize some key properties of the class 
invariants we use. 

6.1. Class invariants. Let O = Z[t] be an imaginary quadratic order with dis- 
criminant D, for some r in the upper half-plane. The j-invariant j(r) is a root of the 
Hilbcrt class polynomial Hp and generates the ring class field Kq. Let f(z) by a 
modular function of level N related to j(z) by ^ f (f (z) , j (z)) = 0, where tyf(F, J) 
is a polynomial with integer coefficients. The value /(r) is an algebraic integer, 
and when /(r) lies in Kq we call it a class invariant. 10 A given modular function 
typically yields class invariants only for a restricted subset of discriminants; for 
example, the primes dividing N must not be inert in Q(V^D). 
We then define the class polynomial Hu[f] by 

H D [f](X)= [] (X-[a]f(r)). 

aed(O) 

For the functions we consider, Hd[J] has integer coefficients, and the techniques 
we have developed to find a root of Hd mod q apply equally well to Ho[f] mod q. 
Having found a root /o of Ho[f], we may obtain a root jo of Hd as a solution 



'We do not require /(r) to generate Kq; wc can obtain a generator as a root of ^/(/(t), Y). 
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to tp(Y) = ^>f(fo,Y) = 0. Since the degree of does not depend on D or q, we 
may bound it by 0(1), where the implicit constant depends on /. Thus deriving 
jo from /o takes just 0(log 2+e q) time. 

6.2. Heuristic height bounds. The key reason to consider alternative class in- 
variants is that Hd[J] may have much smaller coefficients than iJp. Let us define 
the height factor of / as c(f) = dcg F ^ / / degj^f. Asymptotically, we have 

ht(^[/]) = ^ + 0(l), 

where the constant c(f) may be as large as 72. If b bounds the height of Hrj, we 
regard b/c(f) as an approximate bound on the height of Hu[f], but add a small 
constant (say 256 bits) to account for the 0(1) term. We treat the optimized height 
bound b computed in §5 in the same way. 

This heuristic approach may, in rare cases, yield a bound that is too small. In 
practice this is easy to detect. The correct polynomial V(Y) must split completely 
into linear factors in F g [Y], and any sort of random error is extremely likely to 
yield a polynomial that does not. Verifying that V(Y) splits into linear factors can 
easily be incorporated into the root-finding step at no additional cost. 11 If V(Y) 
is found to be incorrect, we may then either retry with a larger height bound, or 
simply revert to / = j and use the rigorous bound proven in §5. 

In most practical applications of the CM method, we seek an elliptic curve E/F q 
with prescribed order TV, where the prime factorization of ./V is known (or provision- 
ally assumed). In this situation we can test whether we have constructed a suitable 
curve in time 0(log 2+e q), via [42, Lemma 6], which is negligible. Although it is 
usually unnecessary one can also verify the endomorphism ring of the constructed 
curve, provided that we know the factorization of the integer v in the norm equation 
4q = t 2 — v 2 D. Using the algorithm in [11, Alg. 2], this takes time subexponential 
in log \D\, under heuristic assumptions, which is also negligible. 

6.3. Implementation. Our tests were performed on a small network of quad-core 
AMD Phenom II 945 CPUs, each clocked at 3.0 GHz. The computation of class 
polynomials (or decompositions thereof) was distributed across up to 48 cores, 
depending on the size of the test, with essentially linear speedup, while all root- 
finding operations were performed on a single core. For consistency we report total 
CPU times, summed over all threads. 

The software was implemented using the gmp [27] and zn_poly [30] libraries, with 
the gcc compiler [25]. Polynomial arithmetic modulo the small primes p € S was 
handled via zn_poly, while polynomial arithmetic modulo large primes q used the 
cache- friendly truncated FFT approach described in [31], layered on top of the gmp 
library. In order to simplify the implementation, when selecting the subgroup G to 
optimize the height bound, only subgroups of the form (9) in §2.6 were considered. 
Additionally, of the various space optimizations described in §4.3 that may be 
applied to Algorithm 2, only the changes necessary to achieve a space complexity 
of 0(h log h+(m+n) \ogq) were used (see §4.3.2). A more complete implementation 
would improve some of the results presented here. 



The first step of the standard root-finding procedure computes the polynomial 
gcd(Y 9 — Y, V(Y)) whose degree is the number of distinct roots of V; duplicate roots can be 
accounted for by taking geds with derivatives of V. 
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Example 1 


Example 2 


Example 3 


Discriminant \D\ 


13569850003 


11039933587 


12901800539 


Field size \lgq~] 


177 


231 


172 


Class number h 


20203 


11280 


54076 


Presentation , . . . , f k k 


720203 


17 1128 ,19 10 


327038 52 


Modular function / 


A 71 


A i7 


A n 


Height factor c(/) 


36 


24 


36 


Standard 








Subgroup size \G\ = h 


20203 


11280 


54706 


Height bound bh 


63127 


56631 


151939 


Number of primes \S\ 


1993 


1783 


4477 


Tfind (ms) 


48 


110 


42 


T on um (ms) 


33 


48 


23 


Tbuild (ms) 


15 


7 


63 


T poIy (s) 


197 


295 


597 


Toot (s) 


56 


54 


171 


Ttot (s) 


253 


347 


768 


Accelerated 








Subgroup size \G\ = n 


227 


1128 


2458 


Height bound b n 


35115 


30957 


50180 


Number of primes \S\ 


1115 


994 


1519 


Tflnd (ms) 


44 


105 


28 


Tcnum (ms) 


33 


47 


23 


Tbuild (ms) 


6 


3 


22 


Tpoly (S) 


95 


155 


118 


Toot (s) 





4 


5 


Tft (s) 


95 


159 


123 



Table 3. Example CM constructions 



As noted in Remark 2, in our implementation we fixed s = e\. This choice 
of s worked in every large (\D\ > 10 6 , logg > 160) example that we tested, which 
included more than a million different combinations of D and G. We conjecture 
that s = e\ always works when using j-invariants, but note that it can fail for other 
class invariants in rare cases (the handful of exceptions we found all involved very 
small discriminants, and in each such case switching to s = e 2 worked). 

6.4. Accelerated CM computations with Algorithm 1. We applied Algo- 
rithm 1 to several examples that have previously appeared in the literature. The 
examples in Table 3 are taken from [42, Table 2] where they appear as repre- 
sentatives of a large set of computations to construct elliptic curves suitable for 
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Standard Accelerated 



1 n\ 
\ u \ 


I L 


ft 


hi. Ih 


T i 

poly 


T 
root 


poly 


T 

± root 


6961631 


5000 


250 


3.63 


1.0 


25 


0.2 


0.7 


23512271 


10000 


250 


3.65 


3.9 


58 


0.8 


0.7 


98016239 


20000 


625 


4.10 


21 


126 


3.5 


2.1 


357116231 


40000 


625 


4.82 


90 


282 


11 


2.2 


2093236031 


100000 


2500 


4.93 


750 


812 


88 


11 


8364609959 


200000 


4000 


6.34 


3590 


1805 


301 


16 


17131564271 


300000 


6250 


6.47 


9070 


2890 


708 


19 


30541342079 


400000 


12500 


6.31 


16900 


3910 


1380 


71 


42905564831 


500000 


15625 


7.11 


28300 


4410 


2300 


85 


170868609071 


1000000 


25000 


7.06 


123000 


9260 


8840 


159 



Table 4. CM constructions using the Weber f- function and q rts 2 256 



pairing-based cryptography. These examples are also used in [24, Table 1] with 
the class invariants we use here. 12 The first five discriminants in Table 4 originally 
appeared in [21, Table 1], and can also be found in [42, Table 4] and [24, Table 2]. 
The remaining discriminants are from [42, Table 4]. 

The time T po iy listed in Tables 3-6 is the total time spent computing the poly- 
nomial V and the polynomials Wk, in the case of Algorithm 1 (steps 1-6), and 
the total time spent computing the polynomial V and the values Wk in the case of 
Algorithm 2 (steps 1-5 and 7-8), including all precomputation. The time T root is 
the time spent on root-finding operations (steps 7-8 in Algorithm 1 and steps 6 and 
9 in Algorithm 2). For the smaller examples, these are averages over 10 runs; with 
the large examples there is very little variance in the root-finding times. 

The "Standard" computations listed in Tables 3 and 4 correspond directly to 
the computations in [24], and are equivalent to running Algorithm 1 with G — cl(O). 
The "Accelerated" computations used Algorithm 1 with G chosen to minimize 
T to t, based on heuristic formulas for T po iy and T root extrapolated from empirical 
data. In most cases this minimizes the corresponding height bound, but not always; 
in the h = 100000 example of Table 4, using n = 5000 rather than n = 2500 
improves the ratio bh/b n from 4.93 to 5.84 and reduces Tpoiy by 10 seconds, but it 
increases T root by 21 seconds, so this subgroup was not chosen. 

The Tp iy times listed in Table 3 are in each case slightly greater than the quantity 
I^KTfind + T cnum + Xbuiid), due to time spent updating the CRT data in step 3e of 
Algorithm 1, which is included in T po i y . This difference is only a few percent for 
the values of q used in these examples, but becomes more significant when q is very 
large (see §6.5). 

The third example in Table 3 illustrates four ways in which Algorithm 1 can 
reduce the time required to apply the CM method using the CRT approach: 



The timings listed here for the standard computations are slightly better (about 5%) than 
those in [24] due to a more recent version of gmp. 



ACCELERATING THE CM METHOD 



31 



1. The height bound b n = 50180 is about 3 times smaller than bh = 151359, 
which reduces 15*1 similarly, from 4477 to 1519. 

2. The average time Tfi n d spent finding an element of Ello(F p ) is reduced from 
42 to 28 milliseconds, because the primes that remain in S are those for 
which it is easier to find curves in Elle>(F p ). 

3. The average time Tb u iid spent building polynomials from their roots (or 
computing linear combinations) is reduced from 63 to 22 milliseconds, be- 
cause the degrees of the polynomials involved are m = 22 and n = 2458 
rather than h = 54076. 

4. Working with polynomials of lower degree reduces the time T root spent 
finding roots dramatically: from 171 seconds to 5 seconds. 

As may be seen in Tables 3 and 4, the speedup achieved by Algorithm 1 is 
typically better than the height bound ratio bh/b n , for the reasons noted above. 
In the last example of Table 4, with discriminant D = —170868609071 and class 
number h = 1000000, computing an optimized height bound b n with n = 25000 
improves the height bound by a factor of about 7, but T po i y is reduced by nearly a 
factor of 14 and T root is reduced even more. 

The discriminant D = -170868609071 also appears in [42, Table 4], which lists 
a time equivalent to 150 CPU days on our current test platform to compute the 
Hilbcrt class polynomial Hp modulo a 256-bit prime q. Here we instead use the 
Weber f- function, with a height factor of 72, and are able to further improve the 
height bound by a further factor of 7 using a decomposition of the class polynomial 
Ho[f\- We eventually obtain a root of the original polynomial Hp mod q, and it 
takes only 2.5 CPU hours to do so, an overall speedup by nearly a factor of 1500. 

6.5. Optimizing space with Algorithm 2. As noted in Remark 2, choosing G 
to optimize the height bound may negate any performance advantage Algorithm 2 
might have over Algorithm 1. Indeed, Algorithm 1 is usually faster, due to the larger 
height bound required by Algorithm 2, and the fact that Algorithm 2 repeats the 
enumeration step in its second stage. However when q is large, Algorithm 2 may use 
much less space than Algorithm 1, which can actually lead to a better running time. 
In this scenario we use the modified form of Algorithm 2 described in §4.4, which 
makes the height bound increase negligible, and to optimize space we choose G so 
that m = h/n is approximately equal to but no larger than n — \G\. 

Table 5 compares the time and space required by Algorithms 1 and 2 for a fixed 
discriminant D = —300000504611 and increasingly large primes q w 2 k . The class 
number is h = 2 18 , and in each case we choose G so that m = n = 2 9 . In addition 
to the times Tfi n( j, Tenum, and Tbuiid listed in Table 3, we also list the average time 
T crt spent updating CRT data for each of the primes p e S. This time is negligible 
when q is of moderate to cryptographic size (say, up to 1024 bits) , but when q is very 
large, as may occur in elliptic curve primality proving [3, 36], the time Algorithm 1 
spends updating its CRT data becomes quite significant. 13 

One can see the two disadvantages of Algorithm 2 in Table 5; it requires a slightly 
larger S, and T cnum is doubled. However Algorithm 2 needs much less space for 
its CRT data, and spends negligible time updating it. In our implementation both 
Algorithms 1 and 2 use 0(h\og \D\) space for the computations performed modulo 



13 As discussed in [42, §6.3], wc should eventually transition from the explicit CRT to a standard 
CRT approach as q grows, but here lg q is still much smaller than the height bound 6. 
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0.54 


13 


1 


378315 


10013 


165 


107 
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386516 
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167 


213 


169 





5630 


3420 


1.06 


14 


1 


378315 


10013 


165 


107 


166 


225 


6700 


16510 


541 




2 


394708 


10437 


168 


214 


170 


1 


5810 


16510 


2.11 


15 


1 


378315 


10013 


165 


107 


166 


461 


9100 


81100 


1078 




2 


411902 


10859 


170 


214 
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2 


6060 
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4.21 



Table 5. Algorithms 1 and 2 with n = \G\ = 512 and q « 2 fc 
Z> = -300000504611 with ft(£>) = 262144 using A 71 . 



each prime p G 5, about 10 MB in this example, but Algorithm 1 requires (9(/i log q) 
space for its CRT data, regardless of the choice of G, whereas Algorithm 2 only 
requires 0((m + n)\ogq) space. As shown in the last two rows of Table 5, for 
q ?a 2 32768 Algorithm 1 uses more than 1 GB of CRT data, compared to about 
4 MB for Algorithm 2, which leads to a significant time advantage for Algorithm 2. 
Note that the memory required by Algorithm 2 to store its CRT data is actually 
half the size listed in Table 5, since with m = n the CRT data is evenly split across 
the 2 stages and the CRT data for the first stage can be discarded before the second 
stage begins. 

Due to its superior space complexity, Algorithm 2 is able to effectively handle a 
broader range of \D\ and q than Algorithm 1. The next section gives an example of a 
computation with \D\ w 10 15 and q ~ 10 10000 that is easily handled by Algorithm 2 
but would be impractical to compute on our test platform using Algorithm 1, or 
any algorithm that requires space proportional to the size of Hd mod q. 

6.6. Some large examples. We also tested Algorithms 1 and 2 with some larger 
discriminants, beginning with D = —1000000013079299, which has class number 
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h(D) = 10034174. This discriminant appears in [24], where it was used to construct 
an elliptic curve over a 256-bit prime field via a class invariant derived from the 
Atkin function A71 . As noted in [24] , this set of parameters was chosen so that the 
level N = 71 is ramified in Q(\/D), which allows us to work with the square root 
of the class polynomial H£>[An], reducing both the degree and the height bound 
by a factor of two. The decomposition techniques described here can be applied 
directly to the polynomial Hr>[An\, allowing both Algorithms 1 and 2 to take 
advantage of this situation. 

Table 6 gives timings for five computations that constructed elliptic curves mod- 
ulo a 256-bit prime q by obtaining a root of the polynomial ^HulAn] mod q. The 
first row corresponds to the original computation in [24]. The next two rows give 
timings for Algorithms 1 and 2 when the subgroup G is chosen to optimize the 
running time of Algorithm 1, with n = \G\ = 44399. This reduced the total CPU 
time by nearly a factor of 5, allowing the entire computation to be completed in 
less than a day of elapsed time on 48 cores. The portion of CPU time spent on 
root-finding was cut dramatically, from more than a day to under five minutes. This 
improvement is particularly helpful in a distributed implementation, as root-finding 
is not as easy to parallelize as the other steps and is most conveniently performed 
on a single CPU. 

The last two rows of Table 6 give timings for Algorithms 1 and 2 when G is 
chosen to optimize the space used by Algorithm 2. This increases the running 
time by about 15%, but requires less than 2 MB of CRT data, compared to about 
250 MB for the original computation (and Algorithm 1). This reduced the total 
memory usage from around 500 MB to about 100 MB. 

As noted in §6.5, the reduced space required by Algorithm 2 becomes critical for 
larger values of q. To demonstrate this, we performed a sixth computation with 
the discriminant D = -1000000013079299, this time using q w 2 33220 . The total 
running time for Algorithm 2 was about 5800000 seconds (including root-finding), 
just a 20% increase over the 256-bit computation, and the size of the CRT data 
was about 25 MB, yielding a total memory usage under 200 MB. The 10000-digit 
prime q and the coefficients of the constructed curve are too large to conveniently 
print here, but they are available at http://math.mit.edu/~drew. 

By contrast, Algorithm 1, and the algorithm of [24], requires more than 20 GB 
of CRT data for this example, and this data needs to be updated for every prime 
p E S. This makes it infeasible to even attempt this computation with Algorithm 1 
on our test platform, whereas Algorithm 2 was easily able to address this example. 

Finally, we performed two record-setting computations, one with h(D) > 5 • 10 7 
and the other with \D\ > 10 16 , again using the polynomial yj Hd[Ah]. First, we 
used the discriminant D = -506112046263599 with class number h(D) = 50666940 
to construct an Edwards curve of the form x 2 + y 2 = 1 + cx 2 y 2 , where 

c = 3499565016101407566774046926671095877424725326083135202080143113943636512545, 

over the 256-bit prime field F q with 

q = 28948022309329048855892746252171986268338819619472424415843054443714437912893. 

The trace of this curve is 



t = 340282366920938463463374607431768266146, 
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alg 


n 


b 

(bits) 


\S\ 


Tfind 

(ms) 


T 

± cnum 

(ms) 


Tbuild 

(ms) 


?crt 

(ms) 


T poly 

00 


-*root 

00 






21533401 


438700 


17500 


1580 


25000 


531 


19400000 


95600 


1 


44399 


8315747 


170112 


12700 


1580 


9210 


531 


4120000 


237 


2 


44399 


8344202 


150662 


12700 


3160 


8350 


5 


4190000 


237 


1 


3277 


11130011 


227504 


13700 


1580 


7180 


535 


5260000 





2 


3277 


11518641 


235482 


14400 


3170 


2510 





4780000 






Table 6. CM computations with \D\ = 10 15 + 13079299 and q w 2 256 . 



and the group order q + 1 — t is 4 times a prime. This computation took approxi- 
mately 200 days of CPU time (about 5 days elapsed time) using Algorithm 2, which 
in this case was faster than Algorithm 1. 

Next we used D = -10000006055889179 with class number h(D) = 25459680 to 
construct an elliptic curve with Weierstrass equation y 2 = x 3 — 3x + c, where 

c = 15325252384887882227757421748102794318349518712709487389817905929239007568605, 
over the 256-bit prime field F q with 

q = 28948022309329048855892746252171992875431396939874100252456123922623314798263. 
This curve has trace 

t = -340282366920938463463374607431768304979, 

and the group order is prime. This computation took about 400 days of CPU time 
(under 8 days elapsed time) using Algorithm 1, which was faster than Algorithm 2 
for this discriminant. 
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